Irregular transcription dynamics for rapid production of high-fidelity transcripts 
Both genomic stability and sustenance of day-to-day life rely on efficient and accurate readout 
of the genetic code. Single-molecule experiments show that transcription and replication are highly 
stochastic and irregular processes, with the polymerases frequently pausing and even reversing direc- 
tion. While such behavior is recognized as stemming from a sophisticated proofreading mechanism 
during replication, the origin and functional significance of irregular transcription dynamics remain 
controversial. Here, we theoretically examine the implications of RNA polymerase backtracking 
and transcript cleavage on transcription rates and fidelity. We illustrate how an extended state 
space for backtracking provides entropic fidelity enhancements that, together with additional fi- 
delity checkpoints, can account for physiological error rates. To explore the competing demands 
of transcription fidelity, nucleotide triphosphate (NTP) consumption and transcription speed in a 
physiologically relevant setting, we establish an analytical framework for evaluating transcriptional 
performance at the level of extended sequences. Using this framework, we reveal a mechanism by 
which moderately irregular transcription results in astronomical gains in the rate at which extended 
high-fidelity transcripts can be produced under physiological conditions. 



As organisms evolved and diversified, more genes,
longer genes and bigger genomes needed to be processed
[1], with increased demands on fidelity. Central to
fidelity in replication and transcription is that the
four different NTPs possess different affinities for
pairing with template nucleotides. This results in a
preference for forming proper Watson-Crick pairs [2].
Although substantial [3], this selectivity is ultimately
limited by early and immutable evolutionary choices
pertaining to the chemistry of nucleotides. To meet
further demands for fidelity, both DNA and RNA
polymerases have evolved proofreading mechanisms capable
of removing errors that have already been incorporated
into their growing polymer product. Through such
mechanisms, replication reaches an error ratio (number
of incorrect bases divided by the number of correct
bases in the final transcript) of the order of $1/10^8$
[4], while transcription achieves error ratios of the
order of $1/10^5$ [5]. In this paper we seek to provide
a quantitative understanding of transcriptional
proofreading and its consequences for nucleotide
consumption and transcription speed. Due to the
incomplete data concerning the microscopic rates for any
individual type of polymerase, we here rely on the great
structural homology among bacterial, eukaryotic, and
archaeal polymerases [5] [7] to infer order-of-magnitude
estimates of transition rates between microscopic states
for a generic polymerase.

The theoretical underpinning of kinetic proofreading 
was established by Hopfield over 30 years ago [8].
However, the standard treatment assumes the bases to be
repeatedly checked before being permanently incorporated
into the growing transcript. This pre-incorporation
selection (PIS) results in an ever-growing transcript.
With the advent of single-molecule techniques, it is now
well established that both RNA and DNA polymerases
elongate their product in a highly irregular manner:
repeatedly pausing, moving backwards, and cleaving bases from
the growing molecule [MTB]. In fact, post-incorporation 
proofreading (PIP) has long been recognized to play a 
vital role in error suppression HH H3 [El [2H] , but 
has received little attention at a quantitative theoretical 
level [H]. 

We here use stochastic modeling to explore the down- 
stream effects of PIP in transcription, and the connection 
between proofreading, irregular transcription dynamics, 
and overall elongation performance. Our stochastic
hopping model [21] [22] is built using structurally
well-characterized states, with transition rates measured
in physiologically relevant settings. The model quantitatively
couples chain elongation to the observed depolymerizing 
action of proofreading [SHU]. Through this we show that 
the highest error suppression calculated within a
standard Hopfield scheme corresponds to a pathological
situation with a net shortening of the transcript over
time, a fact previously overlooked. This highlights the
importance of moving beyond considerations of fidelity alone
if we are to gain even a qualitative understanding of this 
fundamental process. 

Proofreading must be efficient on a wide variety of 
genes, and we adopt a sequence-averaged view to identify 
a mechanism that works on generic sequences. Through 
this we are able to separate the dynamically generated 
heterogeneity from that of a static, sequence based, ori- 
gin. We show that the dynamics of an efficiently tran- 
scribing polymerase should be expected to be irregular — 
even before taking sequence effects into account. This 
suggests that a substantial part of the heterogeneous dy- 
namics seen in single-molecule experiments is function- 
ally advantageous and important for ensuring fidelity 

EBB Effing. 



I. MODELING ERROR SUPPRESSION 
THROUGH PIS AND PIP 

Thermal fluctuations are significant on the molecular 
scale, and we describe transcription as a stochastic hop- 
ping process between well defined states, with transition 
rates set by the intervening free-energy barriers [24].
Following Hopfield [8], we take the error suppression to
be achieved through a sequence of serially connected,
energy-consuming, molecular-scale, error-correcting
checkpoints. The quality of a checkpoint is judged by its
error fraction $r$, and the quality of several sequential
checkpoints is given by the product of the individual
error fractions $r_1 \cdot r_2 \cdot r_3 \cdots$ (see
supplemental information).

Error suppression in transcription involves several 
checkpoints, divided into two classes: PIS and PIP [5]. 
Contrary to the situation for the DNA polymerase, both 
types of checkpoints are controlled by the same mul- 
tifunctional active region inside the RNA polymerase 
(RNAP) [25J [H]. The PIS process likely involves several 
steps [5] before the incoming NTP establishes the correct 
Watson-Crick base pairing with the DNA template, and 
catalyzes onto the growing RNA molecule [5J [STJ . As the 
states prior to catalysis are limited by the free-energy
cost $\Delta G_{act}$ of binding the wrong base to the
template DNA strand within the polymerase,
$r_{PIS} > \exp(-\Delta G_{act}/k_B T)$. From direct
nucleotide discrimination studies, $r_{PIS}$ has been
shown to be $1/10^3$ to $1/10^2$ [3], corresponding to an
average $\Delta G_{act} \approx 6 k_B T$. Utilizing PIS
alone, sequences of no more than a few hundred base pairs
(bp) can be reliably transcribed without errors.
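
As a rough numerical illustration of this limit, the sketch below converts an error fraction into the corresponding free-energy cost and estimates the chance of an error-free transcript. It assumes independent errors at every base, so that the error-free probability for length l is (1 + r)^(-l), and it takes r_PIS = 1/500 as a representative value inside the quoted range. The function names are ours and purely illustrative.

```python
import math

def discrimination_energy(r):
    """Free-energy cost, in units of k_B T, for which exp(-dG) equals r."""
    return -math.log(r)

def error_free_probability(r, length):
    """Probability of no error in `length` bases, assuming independent
    errors with error ratio r (incorrect/correct): (1 + r)**(-length)."""
    return (1.0 + r) ** (-length)

# Assumed representative PIS error fraction within 1/10^3 to 1/10^2.
r_pis = 1.0 / 500.0

print(f"dG_act ~ {discrimination_energy(r_pis):.1f} k_B T")
for length in (100, 300, 1000, 10_000):
    p = error_free_probability(r_pis, length)
    print(f"length {length:>6} bp: P(error-free) = {p:.3g}")
```

With these assumed numbers, roughly half of all 300 bp transcripts come out error-free, while error-free transcripts of 10^4 bp are vanishingly rare.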

To increase fidelity past $r_{PIS}$, and be able to
faithfully transcribe longer genes, RNAP has evolved the
ability to proofread the transcript by selectively
removing already incorporated bases [5j HI [11]. The
successive action of both PIS and PIP is known to bring
the combined error fraction $r_{PIS} r_{PIP}$ down to
around $1/10^5$ [2"BW30|. From the estimates of the PIS
efficiency mentioned above, we expect half of the error
suppression to reside in PIP: $r_{PIP} = 1/10^3$ to
$1/10^2$. Led by experimental results, we now set out to
quantitatively explain how this is achieved in a
physiologically relevant setting through the use of
extended, backtracked pauses. To highlight the benefits
and implications of an extended backtracked state space,
we first consider the case of only one backtracked state,
and later contrast it with the physiologically more
relevant case of multiple states.



A. Proofreading through backtracking 

It is well established that an erroneous base can be
cleaved from the growing transcript once the polymerase
has entered what is known as a backtracked state JSD]
(see Figure 1A): an off-pathway state where the whole
polymerase is displaced backward along the transcript
[THUS]. Within the polymerase, the template DNA and
nascent RNA strands form an 8-9 bp hybrid. As the
polymerase shifts backward, this hybrid remains in
register by breaking the last formed bond and reforming
an old bond at the opposite end of the hybrid [21 [T21
(see Figure 1B and C). This exposes already incorporated
bases to the active site, blocking further elongation but
enabling cleavage of the most recently added base
(catalyzed by the transcription factor IIS in eukaryotes
and GreA and GreB in prokaryotes) [TTJ 1201 US [23 I53H55].
If cleaved, a potential error is removed, the active site
is cleared, and elongation can resume. The cleavage
process competes with the spontaneous recovery from the
backtrack [15], by which the polymerase returns to the
elongation-competent state without removing the potential
error (see Figure 1). In order for cleavage from the
backtracked state to lower the error content, the
cleavage reaction must select for erroneous bases. The
inability of incorrectly matched bases to form proper
Watson-Crick base pairing within the RNA-DNA hybrid
induces this selectivity. If an error has been catalyzed
onto the 3'-end of the nascent RNA molecule, the total
energy of the transcription complex is lowered if the
RNAP moves into a backtrack (see Figure 1). Doing this,
the RNAP extrudes the unmatched base pair from the hybrid
and so returns to the low-energy state of perfect
Watson-Crick base pairing within the entire hybrid (see
Figure 1). When the polymerase is in a backtracked state,
the last added base is exposed to the active site and can
be cleaved off.

How much cleavage from backtracked states con- 
tributes to error suppression depends on the effect of mis- 
incorporations on the transition rates in and out of back- 
tracks. Specifically, the manner in which a missincorpo- 
ration effects the transition state to backtracking deter- 
mines if fidelity increases will be affected through an in- 
creased entrance rate into the backtrack (no shift of tran- 
sition state) or a lowered exit rate out of the backtrack 
(transition state shifts with the hybrid energy). For the 
latter case to have an appreciable proofreading capability, 



[Figure 1 graphic. Panel labels: "last base correct", "last base incorrect", "last base extruded (backtracking)", "active state", "transition to BT", "transition to cat.", "transcript", "coding ssDNA".]

FIG. 1: Single-state backtracking. A) The basic hopping
model coupling one-step backtracking to elongation. The
repetitive unit is highlighted, with the off-pathway
backtracked state indicated as BT. After entering a
backtrack, elongation can resume either through cleavage
out to a previous state of the chain, (NMP)$_{n-1}$, or by
recovery without cleavage to the entrance state,
(NMP)$_n$. B) Schematic illustration of the repeat unit
with a correct base incorporated last. The template
strand, the nascent transcript, and the hybrid region of
the polymerase are shown. The polymerase can enter a
backtrack with rate $k_{bt}$ or add a base to the
transcript with rate $k_{cat}$. From the backtracked
state, recovery by cleavage occurs with rate $k_{clv}$,
while realigning without cleavage occurs at a rate
$k_{rec}$. C) Same as B, but with an incorrect base at the
growing 3'-end of the transcript. The corresponding rates
are indicated with the superscript I. D) Sketch of the
free-energy landscape corresponding to B and C. Solid
black line corresponds to the last base correct; dashed
red line corresponds to the last base incorrect.
$\Delta G_{act}$ refers to the free-energy increase at the
active site when the last incorporated base is wrong,
while $\Delta G_{cat}$ denotes the corresponding increase
in the barrier to catalysis (cat). Recovery without
cleavage occurs at a rate $k_{bt}$, which places all
selectivity in the entrance step to the backtrack (see
text). E) Three traces simulated with a Gillespie
algorithm: a typical polymerizing RNAP ($k_{cat} = 10$/s,
$k_{bt} = 1$/s, $k_{clv} = 0.1$/s, see main text), a
stalled polymerase ($k_{cat} = 1$/s, $k_{bt} = 10/9$ /s,
$k_{clv} = 10$/s), and a depolymerase ($k_{cat} = 1$/s,
$k_{bt} = 10$/s, $k_{clv} = 10$/s). Traces are black when
the polymerase is elongating, and red when backtracked.



every single base must at some point be extruded out of
the polymerase through backtracking, such that the base
can be proofread and removed if it happens to be
incorrectly matched to the template strand. The required
high backtracking frequency would render the
polymerization process inefficient, or even reverse it
(see below), which is clearly not what is observed in
experiments [15, 36]. We thus take the selectivity to
reside in the entrance step of the backtrack (see Figure
1D). For rates as illustrated in Figure 1B and C, this
corresponds to $k_{rec} = k^I_{rec} = k_{bt}$ and
$k^I_{bt} = k_{bt} \exp(\Delta G_{act}/k_B T)$ (rates
corresponding to incorrect bases are denoted with the
superscript I). We will simply refer to $k_{bt}$ as the
backtracking rate, and the resulting form of the
free-energy landscape is illustrated in Figure 1D.
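
The single-state scheme can be explored with a minimal Gillespie simulation. The sketch below is an illustration under stated assumptions, not the simulation code behind Figure 1E: it ignores incorporation errors, lets recovery without cleavage occur at the backtracking rate, and lets each cleavage shorten the transcript by one base. The function name, `t_max` and `seed` are hypothetical choices of ours.

```python
import random

def simulate_single_state(k_cat, k_bt, k_clv, t_max, seed=0):
    """Gillespie run of elongation with ONE backtracked state.

    Returns the net change in transcript length after time t_max.
    Assumes recovery without cleavage occurs at rate k_bt.
    """
    rng = random.Random(seed)
    t, length, backtracked = 0.0, 0, False
    while True:
        # Two competing transitions leave every state:
        #   elongating:  catalysis (k_cat) or backtrack entry (k_bt)
        #   backtracked: cleavage (k_clv)  or recovery (k_bt)
        first_rate = k_clv if backtracked else k_cat
        total = first_rate + k_bt
        t += rng.expovariate(total)      # exponential waiting time
        if t > t_max:
            return length
        take_first = rng.random() * total < first_rate
        if backtracked:
            if take_first:
                length -= 1              # cleavage removes the last base
            backtracked = False          # either way elongation resumes
        elif take_first:
            length += 1                  # one base added
        else:
            backtracked = True

t_max = 2000.0
cases = {
    "typical": (10.0, 1.0, 0.1),
    "stalled": (1.0, 10.0 / 9.0, 10.0),
    "depolymerase": (1.0, 10.0, 10.0),
}
for name, (k_cat, k_bt, k_clv) in cases.items():
    v = simulate_single_state(k_cat, k_bt, k_clv, t_max) / t_max
    print(f"{name:>13}: mean velocity = {v:+.2f} bases/s")
```

The three parameter sets are those quoted in the caption of Figure 1E; the run should show clear net elongation, a velocity near zero, and net shortening, respectively.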



B. Physiological rate estimates 

Although single-molecule traces give us direct access to
many of the individual rates introduced in Figure 1B and
C, the spread even between individual enzymes of any
specific type of polymerase is substantial [37J EH]. On
top of this, not all rates are known for any one type of
polymerase, so we are here content with relying on the
structural homology between polymerases [HJ [7] and take
in vitro rates from the different domains as representing
order-of-magnitude estimates for a generic enzyme. We use
$k_{cat} = 10$/s [37J EH] (prokaryotic) [15] (eukaryotic),
backtracking rate $k_{bt} = 1$/s [39] (prokaryotic), and
cleavage rate $k_{clv} = 0.1$/s [TS] (eukaryotic). Though
this will not cover every scenario, the analytical nature
of our work enables direct application of our results to
other relevant situations.

In a development largely parallel to the theory of ki- 
netic proofreading through PIS [8] , the error suppression 
of PIP can be calculated as (see supplemental informa- 
tion) 



$$ r_{PIP} = \frac{k_{cat}}{k_{cat} + k_{clv}\, e^{(\Delta G_{act} + \Delta G_{cat})/k_B T}}. \tag{1} $$



Here $\Delta G_{cat}$ denotes the change in barrier height
for the transition to catalysis when trying to
incorporate a base directly after an error (see Figure
1). We can get an estimate of $\Delta G_{cat}$ from
published experiments that use "non-hydrolyzable"
nucleotide substitutes. These substitutes are thought not
to influence binding affinities, but to change the
catalysis rate to an extent comparable to that of an
erroneous base [11]. From this we estimate
$\Delta G_{cat} \approx 2 k_B T$. For our typical
polymerase this implies proofreading capabilities
amounting to a modest $r_{PIP} \approx 1/30$: off by an
order of magnitude from the experimentally determined
fidelity ($1/10^3$ to $1/10^2$). Note that the error ratio
is insensitive to $k_{bt}$ for our typical polymerase
($k_{cat} \gg k_{bt} \gg k_{clv}$). Further, a comparison
of the regular traces (see Figure 1E) resulting from this
model (see Figure 1) with those from single-molecule
experiments [HI [17, 39] demonstrates that the model does
not adequately capture the observed irregular
transcription dynamics (see also below). Although much of
the observed dynamical heterogeneity has been attributed
to structural heterogeneity through sequence-specific
pauses [UJ [37J |3D], we here show that this is not
necessarily the dominant contribution.
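
The quoted estimate can be checked with a few lines of arithmetic. The sketch below assumes that Equation 1 has the form r_PIP = k_cat / (k_cat + k_clv exp((dG_act + dG_cat)/k_B T)), and uses dG_act = 6 k_B T and dG_cat = 2 k_B T as point estimates. It also inverts the same expression to ask what cleavage rate a single backtracked state would need in order to reach a target error fraction. The function names are illustrative.

```python
import math

def r_pip_single_state(k_cat, k_clv, dg_total):
    """Assumed form of Equation 1; dg_total = dG_act + dG_cat in k_B T."""
    return k_cat / (k_cat + k_clv * math.exp(dg_total))

def required_cleavage_rate(k_cat, dg_total, r_target):
    """Cleavage rate at which the expression above equals r_target."""
    return k_cat * (1.0 / r_target - 1.0) / math.exp(dg_total)

k_cat, k_clv = 10.0, 0.1      # typical rates from the text, in 1/s
dg_total = 6.0 + 2.0          # assumed point estimates, in k_B T

r = r_pip_single_state(k_cat, k_clv, dg_total)
print(f"r_PIP = 1/{1.0 / r:.1f}")
for r_target in (1e-2, 1e-3):
    k_needed = required_cleavage_rate(k_cat, dg_total, r_target)
    print(f"r_PIP = {r_target:g} would need k_clv = {k_needed:.2f}/s")
```

Under these assumptions a single backtracked state would need a cleavage rate several times to several tens of times higher than 0.1/s to reach the experimentally observed range.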


C. Entropic fidelity enhancements 

It is clear from Equation 1 that apart from increasing
the energy penalty for a bad base pair, a low error ratio
can be achieved through a relative increase of the
transcript cleavage rate compared to the elongation rate.
Given their reverse arrangement ($k_{cat} \approx 10$/s
$\gg k_{clv} \approx 0.1$/s), we speculate that the
evolution of these rates has been strongly limited by
external constraints pertaining to nucleotide chemistry
and the intercellular environment. To mediate these
external constraints, the polymerase has had to find
alternative internal paths to increase error suppression.

One such internal path could be to reduce the free energy
of the backtracked state. This would suppress spontaneous
reversal of the backtrack and thereby increase the
probability of cleavage and error removal. Since a
substantial part of the free energy relates to the
energetics of base matching within the hybrid, the energy
level of the backtracked state is likely constrained by
the structure of the hybrid, again presumably fixed by
early evolutionary choices. However, nature appears to
have come up with a different solution: an effective
entropic reduction in the free-energy level of the
backtracked state is achieved by extending the number of
accessible states. RNAP is able to backtrack by more than
just one base, and thermally move between the different
backtracking states that are available [15, 23] (see
Figure 2A). With $N$ off-pathway and backtracked
proofreading states, the free energy associated with the
backtracked state would, in an equilibrium setting, be
reduced by the entropic term $k_B T \ln(N)$. Even in our
out-of-equilibrium setting this mechanism delays
spontaneous recovery and raises the chance of cleavage
and error removal (see supplemental information). With an
extended backtracking space [51], it is now clear from
simulated traces (Figure 2C) that the irregular dynamics
of our typical polymerase qualitatively matches the
irregular dynamics observed in single-molecule
experiments [T7J [37J EO] (see below for a quantitative
assessment). By comparing the experimental effects of
cleavage-stimulating factors and simulated traces for
increased cleavage rates, we provide further support for
our kinetic scheme in the supplemental information. We
also show that our model can capture the stalling
dynamics of a polymerase as it transcribes against an
increasing force [15].
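
To make the entropic argument concrete, the sketch below computes the probability that a backtrack ends in cleavage rather than spontaneous recovery, as a function of the number N of accessible backtracked states. It rests on assumptions of ours that should be checked against the supplemental information: neighboring backtracked states exchange at rate k_bt in both directions, cleavage occurs at rate k_clv from every backtracked state, and the deepest state can only step back or cleave. The function name is illustrative.

```python
def cleavage_probability(n_states, k_bt, k_clv):
    """Probability that a backtrack entered at depth 1 ends in cleavage.

    Solves the first-step equations of the assumed hopping chain with a
    backward sweep p_d = alpha_d + beta_d * p_(d-1), where p_0 = 0
    stands for recovery without cleavage.
    """
    # Deepest state: it can only cleave or step one base shallower.
    alpha = k_clv / (k_bt + k_clv)
    beta = k_bt / (k_bt + k_clv)
    # Sweep from the deepest state up to depth 1.
    for _ in range(n_states - 1):
        denom = 2.0 * k_bt + k_clv - k_bt * beta
        alpha = (k_clv + k_bt * alpha) / denom
        beta = k_bt / denom
    return alpha  # equals p_1 because p_0 = 0

k_bt, k_clv = 1.0, 0.1  # typical rates from the text, in 1/s
for n in (1, 2, 3, 5, 10, 50):
    p = cleavage_probability(n, k_bt, k_clv)
    print(f"N = {n:>2}: P(cleavage) = {p:.4f}")
```

Within this sketch the cleavage probability rises from about 0.09 for a single backtracked state to about 0.27 for a long chain, which is the delay of spontaneous recovery described above.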

When acting through extended backtracked states, the 
error suppression of PIP can be calculated as (see sup- 
plemental information) 

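$$ r_{1:PIP} = \frac{k_{cat}}{k_{cat} + \sqrt{k_{clv} k_{bt}}\, e^{(\Delta G_{act} + \Delta G_{cat})/k_B T}} = \frac{k_{cat}}{k_{cat} + k_{bt}\, e^{\Delta G_{1:PIP}/k_B T}}, \tag{2} $$

$$ \Delta G_{1:PIP} = \Delta G_{act} + \Delta G_{cat} - \tfrac{1}{2} k_B T \ln(k_{bt}/k_{clv}). \tag{3} $$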

Comparing Equation 2 to Equation 1 we see that fidelity
is increased by extending the space available for



FIG. 2: Multi-state backtracking. A) The basic repeat
unit of multi-state backtracking in a nested scheme. For
visual clarity, only the backtracked states in the
highlighted repeat unit are drawn. B) Sketch of the
free-energy landscape of a multi-state backtrack. Solid
black line corresponds to the last base correct; dashed
red line corresponds to the last base incorrect. Also
illustrated are the multiple backtracked states and the
effect of cleavage. See caption to Figure 1B for a
description of the rates. C) Three traces simulated with a
Gillespie algorithm: a typical polymerizing RNAP
($k_{cat} = 10$/s, $k_{bt} = 1$/s, $k_{clv} = 0.1$/s), a
stalled complex ($k_{cat} = 10$/s, $k_{bt} = 10$/s,
$k_{clv} = 0.1$/s), and a depolymerizing one
($k_{cat} = 1$/s, $k_{bt} = 10$/s, $k_{clv} = 0.1$/s), all
in accordance with the theoretical predictions derived in
the supplemental information. A section of the trace for
our typical polymerase has been magnified, showing two
backtracks, one rescued to elongation by cleavage and one
by diffusion. Only the backtrack reentering elongation
through cleavage would have corrected an error at the end
of the transcript. Traces are black when the polymerase is
elongating, and red when backtracked.



backtracking: the low cleavage rate $k_{clv}$ is replaced
by the geometric mean $\sqrt{k_{clv} k_{bt}}$. This
increases the fidelity by about a factor of three for our
typical polymerase, and provides an error reduction of
$r_{1:PIP} \approx 1/100$. The notation in Equation 3 is
introduced to facilitate the extension to several PIP
checkpoints presented in the next section. The error
suppression now depends on the additional parameter
$k_{bt}$ (cf. Equation 1), a parameter independent of
nucleotide chemistry and susceptible to change through
evolutionary pressures. Although the extension of the
backtracking space does provide fidelity enhancements,
the total fidelity is still at the lower end of what is
experimentally observed. However, our extended
backtracking space gives further proofreading benefits by
supplying the polymerase with additional inherent PIP
checkpoints, as we now discuss.
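
The factor-of-three gain can be verified numerically. The sketch below assumes the forms of Equations 1 to 3 as we read them (cleavage rate replaced by the geometric mean of cleavage and backtracking rates, and an effective free energy reduced by half of k_B T ln(k_bt/k_clv)), with dG_act + dG_cat = 8 k_B T as a point estimate. The function names are illustrative.

```python
import math

K_CAT, K_BT, K_CLV = 10.0, 1.0, 0.1   # typical rates from the text, 1/s
DG_TOTAL = 6.0 + 2.0                  # assumed dG_act + dG_cat, in k_B T

def r_single_state(k_cat, k_clv, dg):
    """Assumed form of Equation 1 (one backtracked state)."""
    return k_cat / (k_cat + k_clv * math.exp(dg))

def r_multi_state(k_cat, k_bt, k_clv, dg):
    """Assumed form of Equation 2 (geometric mean replaces k_clv)."""
    return k_cat / (k_cat + math.sqrt(k_clv * k_bt) * math.exp(dg))

def dg_first_checkpoint(k_bt, k_clv, dg):
    """Assumed form of Equation 3, in units of k_B T."""
    return dg - 0.5 * math.log(k_bt / k_clv)

r_one = r_single_state(K_CAT, K_CLV, DG_TOTAL)
r_many = r_multi_state(K_CAT, K_BT, K_CLV, DG_TOTAL)
# Second form of Equation 2, written with the effective free energy.
dg_1 = dg_first_checkpoint(K_BT, K_CLV, DG_TOTAL)
r_many_alt = K_CAT / (K_CAT + K_BT * math.exp(dg_1))

print(f"single state : 1/{1.0 / r_one:.1f}")
print(f"multi state  : 1/{1.0 / r_many:.1f}")
print(f"fidelity gain: {r_one / r_many:.2f}x")
print(f"two forms of Equation 2 agree: {math.isclose(r_many, r_many_alt)}")
```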
transcription. The polymerase could in principle select
and remove an error as long as it remains within the
hybrid. Intriguingly, the 8-9 bp hybrid might thus not
only serve the purpose of stabilizing the ternary complex
[45], but also provide enhanced fidelity.



FIG. 3: A second PIP checkpoint. The polymerase is
expected to be sensitive also to errors incorporated next
to last. The magnitudes of the rates are illustrated by
the relative thickness of the transition arrows; bad base
stackings are indicated in red. G indicates the free
energy of the complex with respect to the
elongation-competent state.



D. Second PIP checkpoint and beyond 

Even when additional bases have been added to the
transcript after an erroneous incorporation, the error
can in principle still be corrected through an extensive
backtrack and cleavage [21]. For this to lead to an
appreciably increased likelihood of error removal, the
random walk must be biased towards entering further into
the backtrack. With an error at the penultimate
3'-position of the transcript, the polymerase experiences
such a bias, since moving into a backtrack will eliminate
a bad base-pair stacking within the hybrid (see Figure
3). This is followed by another heavily biased step to
completely extrude the error from the hybrid, making it
amenable to cleavage. We know of no direct measurement of
the penultimate bias $\Delta G_{2:PIP}$, but as the
typical stacking energy in an RNA-DNA hybrid is 1.5-4.5
$k_B T$ [2] we assume $\Delta G_{2:PIP} \approx 3 k_B T$.
This second PIP checkpoint provides an error ratio (see
supplemental information) of

$$ r_{2:PIP} = \frac{k_{cat}}{k_{cat} + k_{bt}\, e^{\Delta G_{2:PIP}/k_B T}}. \tag{4} $$



For our typical polymerase $r_{2:PIP} \approx 1/3$, and
the total PIP-induced error reduction
$r_{PIP} = r_{1:PIP} r_{2:PIP} \approx 1/300$ falls well
within the experimentally observed range.

The suggested scheme thus quantitatively accounts for 
the typically observed error-suppression, but there could 
in principle be additional inherent PIP checkpoints that 
would enable the polymerase to reach even higher fideli- 
ties. An increasing free-energy penalty for moving the 
error further into the hybrid would incur a longer range 
bias for backtracking, and additional fidelity gains ac- 
cording to (see supplemental information) 



$$ r_{PIP} = r_{1:PIP}\, r_{2:PIP} \cdots r_{n:PIP} \cdots, \qquad r_{n:PIP} = \frac{k_{cat}}{k_{cat} + k_{bt}\, e^{\Delta G_{n:PIP}/k_B T}}. \tag{5} $$
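
The product structure of serial checkpoints is easy to evaluate. The sketch below assumes each checkpoint has the form r_n = k_cat / (k_cat + k_bt exp(dG_n/k_B T)), takes the first bias from dG_act + dG_cat = 8 k_B T reduced by half of k_B T ln(k_bt/k_clv), and uses the assumed second bias of 3 k_B T. The function names are illustrative, and further checkpoints can be appended to the list.

```python
import math

def checkpoint_error_fraction(k_cat, k_bt, dg_n):
    """Assumed form of one PIP checkpoint; dg_n in units of k_B T."""
    return k_cat / (k_cat + k_bt * math.exp(dg_n))

def total_pip_error_fraction(k_cat, k_bt, dg_list):
    """Product of the error fractions of serially connected checkpoints."""
    r_total = 1.0
    for dg_n in dg_list:
        r_total *= checkpoint_error_fraction(k_cat, k_bt, dg_n)
    return r_total

k_cat, k_bt, k_clv = 10.0, 1.0, 0.1                 # typical rates, 1/s
dg_1 = 6.0 + 2.0 - 0.5 * math.log(k_bt / k_clv)     # first checkpoint
dg_2 = 3.0                                          # assumed stacking bias

r_1 = checkpoint_error_fraction(k_cat, k_bt, dg_1)
r_2 = checkpoint_error_fraction(k_cat, k_bt, dg_2)
r_pip = total_pip_error_fraction(k_cat, k_bt, [dg_1, dg_2])
print(f"r_1:PIP = 1/{1.0 / r_1:.0f}")
print(f"r_2:PIP = 1/{1.0 / r_2:.1f}")
print(f"r_PIP   = 1/{1.0 / r_pip:.0f}")
```

With these inputs the total comes out close to the value of about 1/300 quoted in the text.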



Based on structural considerations of base pairing 
within the RNA-DNA hybrid, we conclude that PIP- 
proofreading of RNAP includes at least two serial check- 
points that account for the typical fidelities observed in 



E. Power-law pause distributions and spatial 
heterogeneity 

We next illustrate the consequences of the proofreading
states on pause duration and frequency. To this end we
simulate our typical polymerase transcribing a long
sequence and compare it to a simulation of an otherwise
identical polymerase, but which has PIP turned off
($k_{bt} = 0$/s). In Figure 4A we show a particular
realization (of our generic polymerase) of incorporation
errors (only PIS in red) together with the errors left
after the section has been proofread (PIS and PIP in
black). The fidelity enhancements are clearly visible,
but they come at the cost of both a decreased velocity
and an increased spatial heterogeneity. These effects are
qualitatively visible already at the level of individual
traces, but are quantitatively best seen in the changes
of the dwell-time distribution (see Figure 4B) or in the
transition-rate (inverse dwell-time) distribution (see
Figure 4C). In the dwell-time distribution, proofreading
introduces a power-law regime, throughout which the
probability of a long pause falls off with duration $t$
as $t^{-3/2}$ [39], until it drops off exponentially
beyond $t \approx 1/k_{clv}$. In Figure 4B we see a clear
exponential behavior of the dwell-time distribution for
both processes at around $t \approx 1/k_{cat} = 0.1$ s,
while the proofreading polymerase also has the
above-mentioned power-law decay extending out to
$t \approx 1/k_{clv} = 10$ s. Similarly, considering the
transition-rate distributions we see a narrow but
significant low-velocity peak develop around the
transition rate $\approx k_{clv} = 0.1$/s, diminishing the
bare elongation peak situated around the rate
$\approx k_{cat} = 10$/s (see Figure 4C). To further
elucidate the effects of the power-law regime, we
consider another important observable: the pause-time
distribution, or the total time a polymerase spends at
each position along the DNA molecule. In Figure 4D we
show pause density plots along a sequence of 500 bp, with
darker bands indicating longer total time spent at that
position during the transcription process. Comparing
transcription with and without PIP it is clear that PIP
leads to greater spatial heterogeneity, exhibiting
distinct regions of markedly increased occupation density
even where there are no incorporation errors. Thus, our
model accounts for both the observed spatial
heterogeneity and the broad pause-time distributions
[ISl 133 without the need to introduce additional
assumptions about the effects of sequence heterogeneity
[HI [39].

Having shown that external constraints can be mediated
through accessing an internal extended backtracked space,
resulting in irregular transcription dynamics, we now
turn our attention to the specific level of irregularity
observed in experiments. Irregularity is tuned by the
backtracking rate, and considering that increasing
$k_{bt}$ would render all proofreading checkpoints more
effective (see Equation 5), one might wonder why the
backtracking rate is kept moderate (1/s) and not made
much larger [36] [39].

FIG. 4: The effects of proofreading. A) On top we show a 
realization of incorporation errors according to our free-energy 
estimates (only PIS in red), and below we show the errors 
that survive (or, possibly, are inserted by) the proofreading 
mechanisms (PIS and PIP in black). B) The dwell-time dis- 
tribution from a process without proofreading and one with 
proofreading. Proofreading gives rise to a power-law regime 
significantly increasing the fraction of long-pauses. C) The 
transition-rate (inverse dwell-time) distribution for the same 
processes as in B, where the effects of proofreading can be seen 
through a shift from a unimodal to a bimodal distribution as 
many excessively slow transitions involving backtracks start 
influencing the kinetics. D) The pause density, or the total 
occupation time plotted for a 500 bp sequence transcribed by 
the same two polymerases as used in B and C. The darker the 
bands, the longer the total occupation time at that position. 
The scales are individually normalized to cover the range of 
occupation times for each polymerase. Two incorporation er- 
rors are indicated with red markers. 



II. TRANSCRIPTION PERFORMANCE 

We have here suggested that by utilizing extended
backtracked states, the polymerase has overcome external
constraints to suppress errors. This introduces the
backtracking rate $k_{bt}$ as a variable susceptible to
evolutionary pressures. In order to understand why the
backtracking rate is kept moderate, we now consider the
phenotypic space made available through the extended
backtracking space. The quantities needed to assess
polymerase performance, as it varies with the level of
PIP, are calculated in the supplemental information by
using continuous-time random walk theory [46]. Starting
with instantaneous transcriptional efficiency measures on
the level of the individual base pairs, we then consider
the efficiency on extended sequences or genes.
Importantly, we investigate how much faster the
polymerase can produce perfect transcripts of extended
sequences with PIP as compared to without PIP.



A. Performance on the level of a base pair 

We are interested in the effective elongation rate, and
thus calculate the average elongation rate $1/\tau_{el}$
(see supplemental information). Since there is only about
one error passing through the PIS checkpoint every 500
bases, we can ignore the effect of errors on the overall
elongation dynamics. We now construct the efficiency
measure

$$ \eta_{el} = \frac{1/\tau_{el}}{k_{cat}} = \frac{1 - k_{bt}/k_{cat}}{1/2 + \sqrt{1/4 + k_{bt}/k_{clv}}}, \tag{6} $$



which describes the relative slowdown due to PIP. With no
PIP ($k_{bt} = 0$) the efficiency is appropriately
$\eta_{el} = 1$, while it vanishes at the transition
between polymerization and depolymerization,
$k_{cat} = k_{bt}$. At this point, elongation stops
proceeding with a well-defined velocity, and behaves
diffusively on large length scales. For
$k_{cat} < k_{bt}$ net depolymerization sets in. This
situation is pathological, and shows that backtracking
cannot dominate the dynamics even though this would be
judged optimal in terms of fidelity calculated within the
Hopfield kinetic proofreading scheme. The transition to
non-functional polymerases can be seen in the
single-molecule transcription traces presented in [52]
[15], and in the simulated traces presented in Figure 2C
(see also supplemental Figure S4C). Also note that the
overall elongation rate increases with increasing
cleavage rate, as is observed experimentally [3TJ |32J
H7]. We next introduce an efficiency parameter for PIP,

$$ \eta_{PIP} = 1 - r_{PIP}, $$

which is 0 in the absence of PIP and 1 for perfect PIP.
Finally, we parameterize the nucleotide efficiency of the
transcription process by the ratio of final transcript
length and the average number of nucleotides consumed in
its production. This ratio is given by the simple
expression (see supplemental information)

$$ \eta_{NTP} = 1 - k_{bt}/k_{cat}. $$

The measure is unity without PIP, and vanishes at stall
($\eta_{el} = 0$).
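
The two closed-form efficiencies can be compared against a direct stochastic simulation. The sketch below is one plausible reading of the multi-state scheme, not the model as defined in the supplemental information: the polymerase enters a backtrack at rate k_bt, hops one base deeper or shallower at rate k_bt each, and cleaves at rate k_clv from any backtracked state, which removes all backtracked bases and restores elongation. Incorporation errors are ignored, and the function names, `t_max` and `seed` are ours.

```python
import math
import random

def eta_el_theory(k_cat, k_bt, k_clv):
    """Elongation efficiency, assuming the form of Equation 6."""
    return (1.0 - k_bt / k_cat) / (0.5 + math.sqrt(0.25 + k_bt / k_clv))

def simulate_multi_state(k_cat, k_bt, k_clv, t_max, seed=0):
    """Gillespie run with an unbounded ladder of backtracked states.

    Returns (net transcript length, nucleotides consumed). Lengths are
    relative to the starting point, so depth is not capped by length.
    """
    rng = random.Random(seed)
    t, length, consumed, depth = 0.0, 0, 0, 0
    while True:
        total = k_cat + k_bt if depth == 0 else 2.0 * k_bt + k_clv
        t += rng.expovariate(total)      # exponential waiting time
        if t > t_max:
            return length, consumed
        u = rng.random() * total
        if depth == 0:
            if u < k_cat:
                length += 1              # catalysis adds one base
                consumed += 1            # and uses one NTP
            else:
                depth = 1                # enter the backtrack
        elif u < k_clv:
            length -= depth              # cleavage removes backtracked bases
            depth = 0
        elif u < k_clv + k_bt:
            depth -= 1                   # step towards recovery
        else:
            depth += 1                   # step deeper into the backtrack

k_cat, k_bt, k_clv, t_max = 10.0, 1.0, 0.1, 20000.0
length, consumed = simulate_multi_state(k_cat, k_bt, k_clv, t_max)
print(f"eta_el : simulated {length / (t_max * k_cat):.3f}, "
      f"formula {eta_el_theory(k_cat, k_bt, k_clv):.3f}")
print(f"eta_NTP: simulated {length / consumed:.3f}, "
      f"formula {1.0 - k_bt / k_cat:.3f}")
```

Agreement between the simulated and closed-form values supports this reading of the scheme for the typical parameter set, but it does not by itself confirm that the sketch matches every detail of the original model.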

Figure 5 shows the three efficiency measures $\eta_{el}$,
$\eta_{PIP}$ and $\eta_{NTP}$ as functions of the
backtracking rate $k_{bt}$ (within the operational range
$0 < k_{bt} < k_{cat} \approx 10$/s), for an otherwise
typical polymerase. We see that while transcription
velocity and nucleotide efficiency correlate positively,
they both correlate negatively with fidelity, directly
illustrating the cost of enhancing fidelity. This hints
at an underlying competition, which we now explore by
considering transcription of extended sequences.

FIG. 5: Polymerase performance. Proofreading efficiency
$\eta_{PIP}$ (red dot-dashed), elongation efficiency
$\eta_{el}$ (black solid) and nucleotide efficiency
$\eta_{NTP}$ (blue dashed) as a function of the
backtracking rate, for an otherwise typical polymerase
with $k_{cat} = 10$/s and $k_{clv} = 0.1$/s. Values
indicated by diamonds were obtained numerically, through
Gillespie simulations.



B. Performance on the level of the gene 

Here we demonstrate that a moderate rate of backtracking
is necessary for rapidly generating transcripts
with few mistakes from extended sequences. This becomes
apparent when noting that the longer the sequence,
the less likely it is for a polymerase to produce
an error-free transcript. It is instructive to introduce
the probability $P_l$ of producing a long error-free
sequence [53] of length $l$. For each attempt, the probability
of transcribing a sequence of length $l$ without an
error is given by $P_l(r) = (1 + r)^{-l} \approx \exp(-lr)$, with
$r = r_{\rm PIP} r_{\rm PIS}$ representing the total error fraction. The
production-rate gain $\chi_{\rm el}$ on extended sequences is obtained
by comparing the rate at which error-free transcripts
are produced with PIP, to the rate with which
they are produced without PIP ($k_{\rm bt} = 0$). Thus,

$\chi_{\rm el} = \eta_{\rm el} P_l(r_{\rm PIS} r_{\rm PIP}) / P_l(r_{\rm PIS}) \approx \eta_{\rm el} \exp(l r_{\rm PIS} \eta_{\rm PIP}).$

Similarly, we introduce the NTP-efficiency gain on extended
genes $\chi_{\rm NTP}$ by comparing the number of error-free
transcripts produced per nucleotide used with and without
PIP, giving

$\chi_{\rm NTP} = \eta_{\rm NTP} P_l(r_{\rm PIS} r_{\rm PIP}) / P_l(r_{\rm PIS}) \approx \eta_{\rm NTP} \exp(l r_{\rm PIS} \eta_{\rm PIP}).$

From both these quantities it is
clear that even moderate PIP provides enormous gains
in the rate of perfectly transcribing long ($l > 1/r_{\rm PIS}$) sequences.
With the two sequence-wide measures that we
have introduced, it is now possible to address transcriptional
efficiencies on the level of transcription of whole
genes. As an example we consider a sequence of a length
comparable to the typical human gene, $l = 10^4$ bp, and
in Figure 6A we plot the efficiencies $\chi_{\rm el}$ and $\chi_{\rm NTP}$ as a
function of the backtracking rate $k_{\rm bt}$ (within the operational
limits $0 < k_{\rm bt} < k_{\rm cat} = 10$/s). Each measure
has a definite optimal value, and we see that the gains in
both the rate of perfect transcript production and nucleotide
efficiency can be enormous, here reaching thirteen orders
of magnitude. If RNAP were optimized to transcribe
this particular sequence length, then we would expect the
true value of the backtracking rate to lie somewhere in
the intermediate region between the peaks, representing
a compromise between NTP efficiency and production
rate. For the intermediate value of $k_{\rm bt} = 1$/s, which coincides
with our estimate of the physiologically relevant backtracking
rate, it would take a polymerase on the order
of one hour to produce an error-free transcript, which
should be compared to $10^{13}$ hours without PIP.

FIG. 6: High fidelity transcript production. A) On
the left vertical axis we mark the production-rate gain on extended
sequences $\chi_{\rm el}$ as a function of the backtracking rate
(black solid line). On the right vertical axis we mark the
NTP-efficiency gain $\chi_{\rm NTP}$ as a function of the backtracking
rate (red dashed line), all for a sequence of length $l = 10^4$ bp.
The region between the two peaks is where one might expect
the optimal value of $k_{\rm bt}$ to lie. Note there is a gain of 13
orders of magnitude in the rate of producing error-free transcripts
when transcribing with PIP as compared to without
PIP, with similar gains in nucleotide efficiency. B) The backtracking
rate that optimizes the production-rate gain (black
solid) or the energy-efficiency gain (red dashed) as a function
of sequence length. Gray shading indicates a region of compromise
between both gains. Inset, a magnification of the
region around $k_{\rm bt} = 1$/s indicates that PIP is optimal with
$k_{\rm bt} = 1$/s for gene lengths of $10^4$ to $4 \times 10^4$ bp. The vertical
blue line indicates the sequence length used in A.
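
The exponential factor shared by both gains follows in one step from the approximation $P_l(r) \approx \exp(-lr)$ together with the definition $\eta_{\rm PIP} = 1 - r_{\rm PIP}$. Written out as a short derivation (a restatement of the expressions in the text, not an additional result):

```latex
\begin{align}
\frac{P_l(r_{\rm PIS}\, r_{\rm PIP})}{P_l(r_{\rm PIS})}
  &\approx \frac{\exp(-l\, r_{\rm PIS}\, r_{\rm PIP})}{\exp(-l\, r_{\rm PIS})} \\
  &= \exp\!\big(l\, r_{\rm PIS}\,(1 - r_{\rm PIP})\big)
   = \exp(l\, r_{\rm PIS}\, \eta_{\rm PIP}).
\end{align}
```

The factor is of order one for $l \lesssim 1/r_{\rm PIS}$ and grows exponentially with sequence length beyond that. Neglecting the prefactors $\eta_{\rm el}$ and $\eta_{\rm NTP}$, a gain of thirteen orders of magnitude corresponds to $l\, r_{\rm PIS}\, \eta_{\rm PIP} \approx 13 \ln 10 \approx 30$.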

Finally, it is interesting to ask how the region of optimal
backtracking rate changes as the transcribed sequence
length varies. Figure 6B shows the $k_{\rm bt}$ that
optimizes $\chi_{\rm el}$ (black solid line) and $\chi_{\rm NTP}$ (red dashed
line) as a function of sequence length $l$. The inset in
Figure 6B highlights the backtracking rate for our typical
polymerase ($k_{\rm bt} = 1$/s), and the implied sequence
lengths ($\sim 10^4$ to $4 \times 10^4$ bp) for which this backtracking
rate would be optimal. A complete discussion would need
to consider relaxed fidelity constraints due to, e.g., codon
redundancy [48], but considering that the average gene
length in eukaryotes lies in the range $10^4$ to $10^5$ bp [1], it
is thought-provoking to speculate that the moderate observed
backtracking rates of around 1/s are the result of
an evolutionary optimization for rapidly and efficiently
producing functional transcripts from genes in the
tens-of-kbp range.
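
To make the length dependence concrete, the following Python sketch evaluates the fidelity factor $P_l(r_{\rm PIS} r_{\rm PIP}) / P_l(r_{\rm PIS})$ for several sequence lengths. The values of `R_PIS` and `ETA_PIP` are assumptions chosen purely for illustration (so that $l\, r_{\rm PIS}\, \eta_{\rm PIP}$ is of order 30 at $l = 10^4$ bp); they are not taken from the text. The sketch does not locate the optimal $k_{\rm bt}$, which would also require the dependence of $\eta_{\rm el}$ and $r_{\rm PIP}$ on $k_{\rm bt}$.

```python
import math


def error_free_probability(length: float, r: float) -> float:
    """P_l(r) = (1 + r)^(-l): chance that one attempt yields an error-free transcript."""
    return (1.0 + r) ** (-length)


def log10_pip_factor(length: float, r_pis: float, eta_pip: float) -> float:
    """log10 of P_l(r_PIS * r_PIP) / P_l(r_PIS), evaluated without the exp approximation."""
    r_pip = 1.0 - eta_pip
    # Work with logarithms so that very long sequences do not underflow.
    log_ratio = length * (math.log1p(r_pis) - math.log1p(r_pis * r_pip))
    return log_ratio / math.log(10.0)


if __name__ == "__main__":
    R_PIS = 3e-3    # ASSUMED illustrative error fraction, not a value from the text
    ETA_PIP = 0.99  # ASSUMED illustrative proofreading efficiency
    for length in (1e2, 1e3, 1e4, 1e5):
        factor = log10_pip_factor(length, R_PIS, ETA_PIP)
        print(f"l = {length:8.0f} bp  log10(fidelity factor) = {factor:7.2f}")
```

Because the logarithm of the factor is linear in $l$, each tenfold increase in sequence length multiplies the number of orders of magnitude gained by ten, which is why proofreading that is nearly irrelevant for short sequences becomes decisive for genes in the tens-of-kbp range.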

III. DISCUSSION 

By analytically studying a model of backtracking coupled
to chain elongation and cleavage, we have shown that
irregular transcription dynamics is likely a result of main- 
taining transcriptional efficiency, not at the level of in- 
dividual nucleotides, but rather, at the level of extended 
sequences and genes. Our work suggests that proofread- 
ing relies on an entropic enhancement of fidelity, where an 
extended state space reduces the chance of spontaneous 
recovery. This ensures low error rates even with low rates 
of transcript cleavage. Through backtracking, an incor- 
porated error can be proofread at least twice through bi- 
asing the entry into backtracks, but could in principle be 
proofread as many times as there are bases in the RNA- 
DNA hybrid within the elongation complex. To what 
extent there are additional proofreading checkpoints be- 
yond the two discussed here is an interesting line of future 
research, providing a potential link between the structure 
of the elongation complex and overall transcriptional ef- 
ficiency and fidelity. Such work might offer additional 
clues as to why the RNA-DNA hybrid has a length of 
about 8-9 bp [39].

Considering both the effects of proofreading on NTP 
consumption and the production rate of extended func- 
tional transcripts, our investigation suggests that the 
internal hopping rate in the backtracked state is not 
optimized for fidelity alone. Instead, it is kept mod- 
erate in order to enable rapid production of extended 
transcripts that are of high fidelity. That there will
be many more backtracks than there are errors to remove
is a direct consequence of undetected errors being
costly, since they have the potential to render the whole 
transcript dysfunctional. A certain level of paranoia is
thus desirable on the part of the polymerase. Even though
such paranoia decreases the instantaneous average transcription
rate, the observed level of backtracking,
perhaps counterintuitively, drastically increases the rate
at which high-fidelity transcripts are produced. Interestingly,
the backtracking rate and the number of backtracks
in cells of a particular organism would be expected
to correlate positively with the sequence length that
has induced the highest evolutionary pressures on transcription
(see Figure 6B, gray region). In other words,
genomes with genes of increasing length should be tran- 
scribed with increasingly irregular dynamics to maintain 
transcriptional efficiency. It would be interesting to determine
if an overall trend in backtracking rate [15, 39],
and consequent irregularity of dynamics, could be found 
for polymerases originating in organisms with varying ge- 
netic complexity. 

To conclude, our model highlights the enormous gains 
offered by post-incorporation proofreading when tran- 
scribing long sequences, illustrating how important this 
basic mechanism has become for the sustenance of life. 